A Geographic Directed Preferential Internet Topology Model 



Sagy Bar* Mira Gonen* Avishai Wool*



February 1, 2008 



Abstract 

The goal of this work is to model the peering arrangements between Autonomous Systems (ASes).
Most existing models of the AS-graph assume an undirected graph. However, peering arrangements 
are mostly asymmetric Customer-Provider arrangements, which are better modeled as directed edges. 
Furthermore, it is well known that the AS-graph, and in particular its clustering structure, is influenced 
by geography. 

We introduce a new model that describes the AS-graph as a directed graph, with an edge going 
from the customer to the provider, but also models symmetric peer-to-peer arrangements, and takes 
geography into account. We are able to mathematically analyze its power-law exponent and number of 
leaves. Beyond the analysis, we have implemented our model as a synthetic network generator we call 
GdTang. Experimentation with GdTang shows that the networks it produces are more realistic than
those generated by other network generators, in terms of their power-law exponent, fractions of customer-provider
and symmetric peering arrangements, and the size of their dense core. We believe that our model
is the first to manifest realistic regional dense cores that have a clear geographic flavor. Our synthetic 
networks also exhibit path inflation effects that are similar to those observed in the real AS graph. 



1 Introduction 

1.1 Background and Motivation 

The connectivity of the Internet crucially depends on the relationships between thousands of Autonomous
Systems (ASes) that exchange routing information using the Border Gateway Protocol (BGP). These relationships
can be modeled as a graph, called the AS-graph, in which the vertices model the ASes, and the edges
model the peering arrangements between the ASes.

Significant progress has been made in the study of the AS-graph's topology over the last few years. In
particular, it is now known that the distribution of vertex degrees (i.e., the number of peers that an AS has)
observed in the AS-graph is heavy-tailed and obeys so-called power-laws [SFFF03]: the fraction of vertices
with degree $k$ is proportional to $k^{-\gamma}$ for some fixed constant $\gamma$. This phenomenon cannot be explained by
traditional random network models such as the Erdos-Renyi model [ER60].
1.2 Modeling Principles for the AS-graph



1.2.1 Direction Awareness 

Peering arrangements between ASes are not all the same [CCG+02, Gao01, DJMS03]. Gao [Gao01] shows
that 90.5% of the peering arrangements have a Customer-Provider nature. This is a commercial arrangement:
the provider sells connectivity to the customer. In such a peering arrangement the provider allows transit
traffic for its customers, but a customer does not allow transit traffic between two of its providers. This
asymmetry is much better modeled by a directed graph, with edges going from the customer to the provider.
However, according to Gao [Gao01], about 8% of the peering arrangements have a symmetric peer-to-peer
nature, and these arrangements need to be modeled as well. Conveniently, symmetric peering arrangements
can be modeled within a directed graph as a pair of anti-parallel directed edges.

The above observations have some important effects on the process by which the AS-graph evolves, 
effects which should be taken into account in a model: 

1. When a new peering arrangement is formed, it is the customer that chooses the provider. 

2. A rational customer will choose a provider offering the best utility, which means, among other
factors, the provider offering the best connectivity. We argue that a provider with many uplinks (i.e.,
an AS that is a customer to many upstream providers) offers better connectivity to its own customers,
and is therefore a more attractive peer.

3. An existing AS's decision to set up a new peering arrangement, with an additional provider, is influenced
by the number of customers the AS already has. We argue that an AS that has many downstream
customers is motivated to keep up with their connectivity demands, and consequently, is motivated to 
add upstream connectivity. 

4. The vast majority of arrangements are asymmetric. However, with a certain probability p, a new 
peering arrangement will be symmetric. 

1.2.2 Geographic Awareness 

The AS-graph structure is known to be influenced by geography [LBCM03, BRCH03, WSS02, BS02, JJ02,
LC03, GK03]. However, in all these works (except for [LC03]), geography is modeled using Euclidean
distances, by defining a coordinate system and attaching coordinates to each AS. We argue that it is difficult
to meaningfully associate a point on the globe with an AS: most ASes, and especially the large ones, cover
large geographic areas, up to whole continents and more.

We take a different approach to modeling AS-level geography. We observe that even though an AS is not
located in one point, most ASes do have a national character [CAI04], which can be inferred, for example,
from the contact address listed in the BGP administrative data. Therefore, to model the effects of geography,
we associate a region with each AS in the model. When an edge is added in our model, we control whether 
it is a local edge (both endpoints within the same region) or a global one (endpoints may be anywhere). 

We shall see that we are able to produce an evolution model of the AS-graph based on all the above 
considerations. We show that our model matches the reality of the AS-graph with surprisingly high accuracy, 
yet it remains amenable to mathematical analysis. 
1.3 Related Work



1.3.1 Undirected Models 

Barabasi and Albert [BA99] introduced a very appealing mathematical model to explain the power-law
degree distribution (the BA model). The BA model is based on two mechanisms: (i) networks grow incrementally,
by adding new vertices, and (ii) new vertices attach preferentially to vertices that are already
well connected. They showed, analytically, that these two mechanisms suffice to produce networks that are
governed by a power-law.

While the pure BA model [BA99] is extremely elegant, it does not accurately model the Internet's
topology in several important aspects: 

• It produces undirected graphs, whereas the AS-graph is much better represented by a directed graph
as discussed above.

• The BA model does not produce any leaves¹ (vertices with degree 1), whereas in the real AS-graph
some 30% of the vertices are leaves.

• The BA model predicts a power-law distribution with an exponent $\gamma = 3$, whereas the real AS-graph
has a power law with $\gamma \approx 2.22$. This is actually a significant discrepancy: for instance, the most
connected ASes in the AS-graph have 500-2500 neighbors, while the BA model predicts maximal
degrees which are roughly 10 times smaller on networks with comparable sizes.

• It is known that the Internet has a rather large dense core [SARK02, SW04, GMZ03, TPSF01,
CEBH00, CHK+01, RMH, CL02, MR95, NSW01, DMS01]: the AS-graph has a core of $\ell = 43$
ASes, with an edge density² $\rho$ of over 70%. However, as recently shown by Sagie and Wool [SW03],
the BA model is fundamentally unable to produce synthetic Internet topologies with a dense core
larger than $\ell = 6$ with $\rho(\ell) > 70\%$. In fact, [SW03] showed that BA topologies, including the BA
variants implemented by both BRITE [MLMB01] and Inet [WJ02], cannot even contain a 4-clique.
This agrees with the findings of Zhou and Mondragon [ZM04].
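The edge density $\rho(\ell)$ used in this dense-core criterion (defined in footnote 2) can be computed as follows. This is a minimal Python sketch, not code from the cited works, and the function name `edge_density` is ours. Because the density is defined over the $\ell(\ell-1)/2$ possible undirected edges, we assume that edge direction and duplicates are ignored. Under that assumption, a symmetric peering stored as two anti-parallel edges counts once.

```python
def edge_density(nodes, edges):
    """Density rho(l) of the subgraph induced by `nodes`: the fraction of
    the l(l-1)/2 possible undirected edges that are present.

    Assumption: direction and duplicates are ignored, so (u, v) and (v, u)
    count as one edge; self-loops are skipped."""
    members = set(nodes)
    l = len(members)
    if l < 2:
        raise ValueError("density needs at least two vertices")
    present = {frozenset((u, v)) for u, v in edges
               if u != v and u in members and v in members}
    return len(present) / (l * (l - 1) / 2)


# A 4-cycle has 4 of the 6 possible edges; a directed 4-clique has all 6.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
clique = [(u, v) for u in range(4) for v in range(4) if u != v]
print(round(edge_density(range(4), cycle), 3))   # 0.667
print(edge_density(range(4), clique))            # 1.0
```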

These discrepancies, and especially the fact that the pure BA model produces an incorrect power-law
exponent $\gamma = 3$, were observed before. Several models have been suggested to improve the BA model,
in order to reduce the power-law exponent. However, most such models still describe the AS-graph as an 
undirected graph. 

Barabasi and Albert themselves refined their model in [AB00] to allow adding new links between existing vertices,
and to allow rewiring existing links. However, as argued by Chen et al. [CCG+02] and by Bu and Towsley
[BT02], the idea of link-rewiring seems inappropriate for the AS-graph. Bu and Towsley [BT02] also
suggested the Generalized Linear Preference model. In their model new vertices attach preferentially to
existing vertices, but the preferential attachment linearly depends on the existing vertex degree minus a
technical parameter $\beta$.

¹ In principle, the BA model can produce leaves if new nodes are born with $m = 1$ edges. However, setting $m = 1$ produces
networks with average degree $\approx 2$, which is about half the value observed in the AS-graph.

² The density $\rho(\ell)$ of a subgraph with $\ell$ vertices is the fraction of the $\ell(\ell-1)/2$ possible edges that exist in the subgraph.
Bianconi and Barabasi [BB01] improved the BA model by defining the Fitness Model, in which the
preferential attachment also depends on a per-node fitness parameter $\eta_i$. However, as shown by Zhou and
Mondragon [ZM04], this model does not achieve a dense core.

Bar, Gonen, and Wool [BGW04] improved the BA model by defining the InEd model, in which $m-1$
out of the $m$ added new edges connect existing nodes. Even though the InEd model is undirected, it is the
starting point of our work.

1.3.2 Directed Models 

Pure directed models for the AS-graph have been suggested by Bollobas et al. [BBCR03], Aiello et al.
[ACL01], and Krapivsky et al. [KRR01]. All of these models share the same basic approach for adding
directed edges: a node is selected as the outgoing (customer) endpoint with a probability that is proportional
to its out-degree, and a node is selected as the incoming (provider) endpoint with a probability proportional
to its in-degree. All of these models produce a power-law distribution in both the in-degree and the out-degree.
Nevertheless, we argue that their assumptions are hard to justify. If the probability of choosing
an incoming endpoint depends on the current in-degree, it means that an AS with many customers is seen
as a desirable provider. Similarly, in their approach, an AS with many providers is motivated to add more
providers. Since the real motivation of adding edges in the AS-graph is to improve the connectivity of the
graph, we see no good reason why a node with an already large in-degree would be a desirable provider. We
argue that it should be the other way around: an AS with many uplinks is a desirable provider. Similarly, it
is not clear why a node with a large out-degree would be more inclined to increase its out-degree further.

1.3.3 Geographic Models 

Several previous models considered geography: Ben-Avraham et al. [BRCH03] suggest a method for embedding
graphs in Euclidean space. Their method connects nodes to their geographically closest neighbors,
and thus it economizes on the total physical length of links. Lakhina et al. [LBCM03] explore the geographical
location of the Internet's physical structure. However, the location of equipment is not directly tied
to the commercial links found in the AS-graph. Warren et al. [WSS02] suggest a lattice-based scale-free
network, where nodes link to nearby neighbors on a lattice. Jost and Joy [JJ02] suggest a model where
new nodes form links with other nodes of preferred distances, in particular shortest distances. Brunet and
Sokolov [BS02] suggested a model where the probability of connecting two nodes depends on their degree
and on the distance between them. All the above models consider geography based on Euclidean distances
or the length of the shortest path between the nodes. Li and Chen [LC03] suggest a different, non-Euclidean
concept of geography. Their model is based on the BA model, with a local-world connectivity. However,
their model gives a power-law distribution with the same (incorrect) exponent $\gamma = 3$ as the BA model.
Our approach to geography is reminiscent of [LC03], since we do not attempt to use a Euclidean geography
model. Instead we associate an AS with a region, and probabilistically designate edges as either local or
global.

1.3.4 Limitations and Bias in the AS graph 

The AS-graph itself is an imperfect model of the real state of BGP routing. Chen et al. [CCG+02] point
out that AS peering relationships observed in BGP data are not synonymous with physical links, that the
advertised data is incomplete, and that peering relationships are not all equivalent. Moreover, according to
[CGJ+02], a significant number of existing AS connections remain hidden from most BGP routing tables,
and there are about 25-50% more AS connections in the Internet than commonly used BGP-derived
AS maps reveal. A critique of pure degree-based network generators appears in [TGJ+02], which claims
that such synthetic networks misrepresent hierarchical features of the Internet structure. Willinger et al.
[WGJ+02] claim that the proposed criticality-based models fail to explain why such scaling behavior arises
in the Internet.

Lakhina et al. [LBCX03] claim that a power-law degree distribution may be an artifact of the BGP data
collection procedure, which may be biased. They suggest that although the observed degree distribution
of the AS-graph follows a power-law distribution, the degree distribution of the real AS-graph might be
completely different. Thus, our view of reality may be inaccurate. Clauset and Moore ([CM04a], [CM04b])
analytically confirmed the numerical findings of Lakhina et al. However, Petermann and De Los Rios [PR04]
showed that in the case of a single source, the exponent obtained for the power-law distributions in the BA
model is only slightly under-estimated.

Obviously, we cannot model data that is unknown. Therefore, we measure our model's success against 
what is known about the AS-graph, assuming that this information is indicative (even though it may be 
biased). 

Finally, we believe that besides its inherent interest, modeling the AS-graph, despite its shortcomings, 
is an important practical goal. The reason is that with more accurate topology models, we can build more 
accurate synthetic network topology generators. Topology generators are widely used whenever one wishes 
to evaluate any type of Internet-wide phenomenon that depends on BGP routing policies. A few recent 
examples include testing the survivability of the Internet [AJB00, DJMS03], comparing methods of defense
against Denial of Service (DoS) attacks [WLC04], and suggesting new methods for combating source IP address
spoofing [LPS04]. Unfortunately, the most popular topology generators currently used in such studies
(BRITE [MLMB01] and Inet [WJ02]) are based on the BA model, which is known to be inaccurate in
several key features. We hope that our model, and our GdTang network generator, will make such studies 
more accurate and reliable. 

1.4 Contributions 

Our main contribution is a new model that has the following features: 

• It describes the AS-graph as a directed graph, which models both customer-provider and symmetric 
peering arrangements. 

• It produces networks which accurately model the AS-graph with respect to: (i) the value of the power-law
exponent $\gamma$, (ii) the size of the dense core, (iii) the number of customer-provider links, and (iv)
the number of leaves. In fact, it significantly improves upon all existing models we are aware of, with 
respect to all these parameters. 

• It includes a simple notion of geography that, for the first time, produces networks with accurate
Regional Cores: secondary dense clusters that are local to a geographic region.

• Our networks exhibit realistic path inflation effects. 
• It is natural and intuitive, and follows documented and well understood phenomena of the Internet's
growth. 

• We are able to analyze our model, and rigorously prove many of its properties. 

Organization: In the next section we give an overview of the BA model and of the Incremental Edge
Addition (InEd) model. In Sections 3 and 4 we introduce the Geographic Directed Incremental Edge Addition
(GeoDInEd) model. Section 5 describes GdTang and the results of our simulations. We conclude
with Section 6.

2 Undirected BA Models 

2.1 The pure BA model 

The pure BA model works as follows: (i) Start with a small number ($m_0$) of arbitrarily connected vertices.
(ii) Incremental vertex addition: at every time step, add a new vertex with $m$ ($< m_0$) edges that connect
the new vertex to $m$ different vertices already present in the system. (iii) Preferential attachment: the new
vertex picks its $m$ neighbors randomly, where an existing vertex $i$, with degree $k_i$, is chosen with probability

$$p(k_i) = \frac{k_i}{\sum_j k_j}.$$

Since every time step introduces 1 vertex and $m$ edges, it is clear that the average degree of the resulting
network is $\approx 2m$.
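The three steps above can be sketched in Python as follows. This is an illustrative sketch, not the BRITE or Inet implementation, and the function names are ours. We assume that the $m_0$ seed vertices form a clique, although the text only requires them to be arbitrarily connected. Preferential attachment uses a standard trick: every vertex appears in a list once per edge endpoint, so a uniform draw from that list selects vertex $i$ with probability $k_i / \sum_j k_j$.

```python
import random


def degrees(edges):
    """Map each vertex to its degree in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def ba_graph(n, m, m0, seed=None):
    """Grow a pure BA graph with n vertices; returns an undirected edge list.

    Assumption: the m0 seed vertices are connected as a clique."""
    if not 1 <= m < m0 < n:
        raise ValueError("need 1 <= m < m0 < n")
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    # `ends` lists every edge endpoint, so rng.choice(ends) picks vertex i
    # with probability k_i / sum_j k_j (preferential attachment).
    ends = [x for e in edges for x in e]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:          # m *different* existing vertices
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


edges = ba_graph(2000, m=2, m0=3, seed=1)
deg = degrees(edges)
print(2 * len(edges) / len(deg))   # average degree, close to 2m = 4
print(min(deg.values()))           # 2: no vertex is ever a leaf
```

Every new vertex arrives with $m \ge 2$ edges, so in this sketch the minimum degree never drops below 2. This illustrates why the pure BA model produces no leaves.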

Observe that new edges are added in batches of $m$. This is the reason why the pure BA model never
produces leaves [SW04], and the basis for the model's inability to produce a dense core. Furthermore,
empirical evidence [CCG+02] shows that the vast majority of new ASes are born with a degree of 1, and
not 2 or 3 (which would be necessary to reach the AS-graph's average degree of 4.2).

2.2 The Incremental Edge Addition (InEd) Model 

In an attempt to correct some of the shortcomings of the pure BA model, Bar, Gonen, and Wool suggested
the InEd model [BGW04]. This model forms the starting point for the current model.

As in the BA model, the InEd model uses incremental vertex addition and preferential attachment. The
main difference between this model and the BA model is the way in which edges are introduced into the
network. The InEd model works as follows: (i) Start with $m_0$ nodes. (ii) At each time step add a new node
and $m$ edges. One edge connects the new node to a node that is already present. An existing vertex $i$, with
degree $k_i$, is chosen with probability $p(k_i) = k_i / \sum_j k_j$. (That is, $p(k_i)$ is linear in $k_i$, as in the BA model.)
The remaining $m-1$ edges connect existing nodes: one endpoint of each edge is uniformly chosen, and the
other endpoint is connected preferentially, choosing a node $i$ with the probability $p(k_i)$ as defined above.

The authors show that the InEd model produces a realistic number of leaves, and better dense-cores and 
power-law exponents than the pure BA model. 
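A corresponding sketch of one InEd run is below. Again this is our illustration, not the code of [BGW04]. It assumes an integer $m$ and a clique of $m_0$ seed nodes. It also assumes that a self-loop is avoided by redrawing the preferential endpoint and that parallel edges are allowed. The essential change from plain BA growth is that $m-1$ of the $m$ edges per step join two existing nodes, which lets a new node stay at degree 1.

```python
import random


def ined_graph(n, m, m0=3, seed=None):
    """Sketch of InEd growth (undirected); returns an edge list.

    Assumptions: integer m >= 1, clique of m0 >= 2 seed nodes, self-loops
    redrawn, parallel edges allowed, degrees updated once per time step."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    ends = [x for e in edges for x in e]   # uniform draw ~ k_i / sum_j k_j
    for new in range(m0, n):
        # One edge: new node -> existing node chosen preferentially.
        step = [(new, rng.choice(ends))]
        # m - 1 edges between existing nodes: uniform end, preferential end.
        for _ in range(m - 1):
            u = rng.randrange(new)         # uniform over nodes 0 .. new-1
            v = rng.choice(ends)
            while v == u:                  # redraw to avoid a self-loop
                v = rng.choice(ends)
            step.append((u, v))
        for e in step:
            edges.append(e)
            ends.extend(e)
    return edges


edges = ined_graph(3000, m=2, seed=1)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
leaves = sum(1 for d in deg.values() if d == 1)
print(leaves / len(deg))   # a substantial fraction of degree-1 nodes
```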
3 The Directed Incremental Edge Addition (DInEd) Model



For ease of exposition, in this section we describe our model with no reference to geography, and refer to it 
as the DInEd model. In the next section we expand the model to incorporate a notion of geography. 

The DInEd model is based on the following basic concept: the purpose of growing edges is to improve
the connectivity of the graph. A customer pays its provider for transit services. As a result, a provider with
many customers is motivated to be connected to other providers that are already well connected. Thus, a
node is more likely to grow edges if its in-degree is large, and a node with a large out-degree is more likely
to be chosen as the provider endpoint of an edge.

In addition to the customer-provider edges, we also consider symmetric peer-to-peer relations. We 
model peer-to-peer relations as anti-parallel directed edges that connect two nodes. In this section we give 
our model's definition, analyze its degree distribution and prove that it is close to a power-law distribution. 
We also analyze the expected number of leaves. 

3.1 Model Definition 

The basic setup in the DInEd model is the same as in the InEd model, with the important difference that
we get a directed graph: We start with $m_0$ nodes. At each time step we add a new node, and $m$ directed
edges. The edges are added in the following way:

1. One edge connects the new node $v$ as a customer to some node that is already present. The edge is
directed from $v$ to the chosen node. An existing vertex $i$, with out-degree $y_i$, is chosen as a provider
for node $v$ with probability $p(y_i) = y_i / \sum_j y_j$.

2. The remaining $m-1$ edges connect existing nodes. The outgoing (customer) endpoint of each edge
is chosen preferentially, choosing a node $i$ with in-degree $k_i$ with probability $p(k_i) = k_i / \sum_j k_j$.
The incoming (provider) endpoint is also connected preferentially, choosing a node $i$ with probability
$p(y_i)$ as before. Note that a node's motivation to originate another outbound link is proportional to
the number of downstream customers it already has.

3. With probability $p$, each of the added edges, after choosing its endpoints, will be an undirected edge,
modeled as two anti-parallel directed edges ($p$ is a parameter of our model). Thus, the new node is
always added with an out-degree of 1, but its in-degree can be either 0 (with probability $1-p$) or 1
(with probability $p$).

Note that, unlike the models of Krapivsky et al. [KRR01], Bollobas et al. [BBCR03] and Aiello et al.
[ACL01], a node's desirability as a provider depends on its out-degree, i.e., on the level of connectivity it
can provide to its downstream customers.
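The three DInEd rules can be sketched as below. This Python sketch is ours and is not GdTang. It makes several assumptions:

- $m$ is an integer.
- The $m_0$ seed nodes are joined pairwise by symmetric (anti-parallel) edges, so every seed node has positive in- and out-degree.
- A self-loop is avoided by redrawing the provider.
- Degrees are updated once at the end of each time step, mirroring the analysis below, which ignores changes within a step.

Edges are stored as (customer, provider) pairs.

```python
import random


def dined_graph(n, m, p, m0=3, seed=None):
    """Sketch of DInEd growth; returns directed (customer, provider) edges.
    A symmetric peering is stored as two anti-parallel edges.

    Assumptions: integer m >= 1, m0 >= 2 seed nodes joined pairwise by
    symmetric edges, self-loops redrawn, parallel edges allowed, degree
    pools updated once per time step."""
    rng = random.Random(seed)
    edges = [e for u in range(m0) for v in range(u + 1, m0)
             for e in ((u, v), (v, u))]
    # Node x appears in out_pool once per out-edge and in in_pool once per
    # in-edge, so uniform draws give p(y_x) = y_x / sum y and
    # p(k_x) = k_x / sum k respectively.
    out_pool = [c for c, _ in edges]
    in_pool = [q for _, q in edges]
    for v in range(m0, n):
        pairs = [(v, rng.choice(out_pool))]   # rule 1: provider ~ out-degree
        for _ in range(m - 1):                # rule 2: existing nodes only
            c = rng.choice(in_pool)           # customer ~ in-degree
            q = rng.choice(out_pool)          # provider ~ out-degree
            while q == c:
                q = rng.choice(out_pool)
            pairs.append((c, q))
        step = []
        for c, q in pairs:
            step.append((c, q))
            if rng.random() < p:              # rule 3: symmetric w.p. p
                step.append((q, c))
        for c, q in step:
            edges.append((c, q))
            out_pool.append(c)
            in_pool.append(q)
    return edges


edges = dined_graph(5000, m=2, p=0.07, seed=1)
out_deg, in_deg = {}, {}
for c, q in edges:
    out_deg[c] = out_deg.get(c, 0) + 1
    in_deg[q] = in_deg.get(q, 0) + 1
# A leaf has in-degree 0 and out-degree 1.
leaves = sum(1 for x in out_deg if out_deg[x] == 1 and x not in in_deg)
print(leaves / len(out_deg))
```

The printed leaf fraction can be compared with the estimate $(1+p)(1-p)/(2+p) \approx 0.48$ of Theorem 3.4 for $p = 0.07$.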

3.2 Power Law Analysis 

In this section we show that the DInEd model produces a power-law degree distribution. We analyze our
model using the "mean field" methods of Barabasi-Albert [BA99]. Let $k_i(t)$ denote node $i$'s in-degree at
time $t$, and let $y_i(t)$ denote its out-degree at time $t$. As in [BA99], we assume that $k_i$ and $y_i$ change in a
continuous manner, so $k_i$ and $y_i$ can be interpreted as the average degree of node $i$, and the probabilities
$p(k_i)$ (respectively $p(y_i)$) can be interpreted as the rate at which $k_i$ (respectively $y_i$) changes.

Theorem 3.1 In the DInEd model,

1. $\Pr[k_i(t) = k] \propto k^{-(1 + 1/\lambda_1)}$,

2. $\Pr[y_i(t) = y] \propto y^{-(1 + 1/\lambda_1)}$,

where $\lambda_1 = \frac{p(2m-1) + A}{2m(1+p)}$ and $A = \sqrt{p^2 + 4m(m-1)}$.

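We prove the theorem using the following lemma.
Lemma 3.2 Let $t_i$ be the time at which node $i$ was added to the system. Then

$$k_i(t) = \frac{C+p}{2}\left(\frac{t}{t_i}\right)^{\lambda_1} - \frac{C-p}{2}\left(\frac{t}{t_i}\right)^{\lambda_2}, \qquad (1)$$

$$y_i(t) = G\left(\frac{t}{t_i}\right)^{\lambda_1} + \bigl(G - 2D(A+C)\bigr)\left(\frac{t}{t_i}\right)^{\lambda_2}, \qquad (2)$$

where $\lambda_2 = \frac{p(2m-1) - A}{2m(1+p)}$, $B = 2m - p^2$, $C = B/A$, $D = \frac{p}{4m}$, and $G = DC + 1/2 + DA$.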

Proof: At time $t$ the sum of the in-degrees is $mt(1+p)$, and the sum of the out-degrees is also $mt(1+p)$.
The change in an existing node's in-degree is influenced by two terms. The first is the probability of it being chosen preferentially
depending on its out-degree, for each of the $m$ added edges. The second is the probability of it being chosen preferentially
depending on its in-degree, for each of the $m-1$ added edges, multiplied by the probability of having
the anti-parallel edge. This gives us the following differential equation



$$\frac{\partial k_i}{\partial t} = m \cdot \frac{y_i}{mt(1+p)} + p(m-1) \cdot \frac{k_i}{mt(1+p)} = \frac{1}{1+p} \cdot \frac{y_i}{t} + \frac{p(m-1)}{m(1+p)} \cdot \frac{k_i}{t} \qquad (3)$$

The change in an existing node's out-degree is likewise influenced by two terms. The first is the probability of it being chosen preferentially
depending on its in-degree, for each of the $m-1$ added edges. The second is the probability of it being chosen
preferentially depending on its out-degree, for each of the $m$ added edges, multiplied by the probability
of having the anti-parallel edge. This gives



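$$\frac{\partial y_i}{\partial t} = pm \cdot \frac{y_i}{mt(1+p)} + (m-1) \cdot \frac{k_i}{mt(1+p)} = \frac{p}{1+p} \cdot \frac{y_i}{t} + \frac{m-1}{m(1+p)} \cdot \frac{k_i}{t} \qquad (4)$$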

We ignore the changes in degrees that occur during the time step. Thus we get the following system of 
differential equations: 

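$$\frac{dk_i}{dt} = \frac{1}{1+p} \cdot \frac{y_i}{t} + \frac{p(m-1)}{m(1+p)} \cdot \frac{k_i}{t} \qquad (5)$$

$$\frac{dy_i}{dt} = \frac{p}{1+p} \cdot \frac{y_i}{t} + \frac{m-1}{m(1+p)} \cdot \frac{k_i}{t} \qquad (6)$$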
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Since a node enters the network as a customer with a single provider with probability 1—p, and with a single 
peer-to-peer arrangement with probability p, the initial conditions for node i are k(ti) = p, and y(U) = 1. 
Solving for ki(t) and yi(t) proves the Lemma. 

I 
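As a numerical sanity check of Lemma 3.2, one can integrate (5) and (6) directly and compare the result with the closed form. The Python sketch below is ours, and the function names are hypothetical. It rewrites the system in $s = \ln(t/t_i)$, where it becomes linear with constant coefficients, and integrates it with a classical fourth-order Runge-Kutta step. The closed form is evaluated with the constants of Lemma 3.2 written out explicitly: $C = (2m-p^2)/A$, $D = p/(4m)$ and $G = DC + 1/2 + DA$.

```python
import math


def lemma_closed_form(m, p, s):
    """(k_i, y_i) from Lemma 3.2 at s = ln(t / t_i)."""
    A = math.sqrt(p * p + 4 * m * (m - 1))
    lam1 = (p * (2 * m - 1) + A) / (2 * m * (1 + p))
    lam2 = (p * (2 * m - 1) - A) / (2 * m * (1 + p))
    C = (2 * m - p * p) / A
    D = p / (4 * m)
    G = D * C + 0.5 + D * A
    x1, x2 = math.exp(lam1 * s), math.exp(lam2 * s)   # (t/t_i)^lambda
    k = (C + p) / 2 * x1 - (C - p) / 2 * x2
    y = G * x1 + (G - 2 * D * (A + C)) * x2
    return k, y


def lemma_numeric(m, p, s_end, steps=20000):
    """RK4 solution of (5)-(6) with k_i(t_i) = p, y_i(t_i) = 1.

    With s = ln(t / t_i), t * d/dt becomes d/ds, so
    dk/ds = a*k + b*y and dy/ds = c*k + d*y with constant a, b, c, d."""
    a = p * (m - 1) / (m * (1 + p))
    b = 1 / (1 + p)
    c = (m - 1) / (m * (1 + p))
    d = p / (1 + p)

    def f(k, y):
        return a * k + b * y, c * k + d * y

    k, y, h = p, 1.0, s_end / steps
    for _ in range(steps):
        r1 = f(k, y)
        r2 = f(k + h / 2 * r1[0], y + h / 2 * r1[1])
        r3 = f(k + h / 2 * r2[0], y + h / 2 * r2[1])
        r4 = f(k + h * r3[0], y + h * r3[1])
        k += h / 6 * (r1[0] + 2 * r2[0] + 2 * r3[0] + r4[0])
        y += h / 6 * (r1[1] + 2 * r2[1] + 2 * r3[1] + r4[1])
    return k, y


# m = 2.11, p = 0.07, t / t_i = e^5: the two results should agree closely.
print(lemma_closed_form(2.11, 0.07, 5.0))
print(lemma_numeric(2.11, 0.07, 5.0))
```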

Corollary 3.3 The expected maximal in-degree and maximal out-degree in the DInEd model obey

$$k_i(t) = \frac{C+p}{2}\, t^{\lambda_1} - \frac{C-p}{2}\, t^{\lambda_2},$$

$$y_i(t) = G\, t^{\lambda_1} + \bigl(G - 2D(A+C)\bigr)\, t^{\lambda_2}.$$

Proof: By setting $t_i = 1$ in Lemma 3.2 we get the result. ∎

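Proof of Theorem 3.1: We bound the probability that a node has an in-degree $k_i(t)$ which is smaller than
$k$, using Lemma 3.2. Since $m > 1$, we have $p(2m-1) - A < 0$ for $p < 1$. Therefore, for
$p < 1$ we have $\lambda_2 < 0$, and thus

$$\left(\frac{t}{t_i}\right)^{\lambda_2} \xrightarrow[t \to \infty]{} 0.$$

(If $p = 1$ then $\lambda_2 = 0$, so in this case $(t/t_i)^{\lambda_2}$ is constant.) Therefore, we can ignore the terms involving $\lambda_2$ in (1) and (2) and get

$$k_i(t) \approx \frac{C+p}{2}\left(\frac{t}{t_i}\right)^{\lambda_1}, \qquad y_i(t) \approx G\left(\frac{t}{t_i}\right)^{\lambda_1}. \qquad (7)$$

Now, since one node is added per time step, the arrival time $t_i$ of a randomly chosen node is (approximately) uniformly
distributed on $[0, t]$. Hence $\Pr[k_i(t) < k] = \Pr\bigl[t_i > t\,((C+p)/(2k))^{1/\lambda_1}\bigr]$, and we get

$$\Pr[k_i(t) < k] \approx 1 - \left(\frac{C+p}{2k}\right)^{1/\lambda_1}. \qquad (8)$$

Thus

$$\Pr[k_i(t) = k] \approx \frac{\partial \Pr[k_i(t) < k]}{\partial k} = \frac{1}{\lambda_1}\left(\frac{C+p}{2}\right)^{1/\lambda_1} k^{-(1 + 1/\lambda_1)}.$$

This completes the first part of the theorem, regarding the in-degree $k_i$. In the same manner we prove
the second part of the theorem, regarding the out-degree $y_i$. From Lemma 3.2 (via (7)) we have that

$$\Pr[y_i(t) < y] \approx 1 - \left(\frac{G}{y}\right)^{1/\lambda_1}.$$

Therefore

$$\Pr[y_i(t) = y] \approx \frac{1}{\lambda_1}\, G^{1/\lambda_1}\, y^{-(1 + 1/\lambda_1)}. \qquad ∎$$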


Theorem 3.1 shows that the DInEd model produces a power-law distribution in both the in-degree and
out-degree. Note that the power-law exponent for in-degree and out-degree is the same. For Internet parameters
we need $m = 2.11$ [BGW04] and $p = 0.07$. We shall see in Section 5 that setting $p$ to 0.07
gives approximately 8% peer-to-peer arrangements, as reported by Gao [Gao01]. Using these values, we
calculate a predicted power-law exponent of $\gamma = 1 + 1/\lambda_1 \approx 2.37$. This is quite close to the real value of
$\gamma \approx 2.22$ [SFFF03]. Certainly this is a closer fit to reality than the fit achieved by earlier works ([BGW04],
[BA99]), which showed power-law exponents of $\gamma = 2.83$ and $\gamma = 3$, respectively. The earlier work of
[BT02] can achieve any value of $\gamma > 1$ through proper choice of the parameters of their GLP model. The
work of [KRR01] gives different power-laws for the in-degree and out-degree: for the in-degree the model
of [KRR01] gives $\gamma = 2.1$, and for the out-degree $\gamma = 2.7$.
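The predicted exponent quoted above follows directly from Theorem 3.1. The short Python helper below, our own with a hypothetical name, reproduces it for $m = 2.11$ and $p = 0.07$.

```python
import math


def dined_exponent(m, p):
    """Power-law exponent gamma = 1 + 1/lambda_1 of Theorem 3.1."""
    A = math.sqrt(p * p + 4 * m * (m - 1))
    lam1 = (p * (2 * m - 1) + A) / (2 * m * (1 + p))
    return 1 + 1 / lam1


print(round(dined_exponent(2.11, 0.07), 2))   # 2.37
```

The exponent depends only on $m$ and $p$. For example, with $m = 2$ and $p = 1$ one gets $A = 3$, $\lambda_1 = 3/4$ and $\gamma = 7/3$.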



3.3 Analysis of the Expected Number of Leaves 

Note that in the DInEd model a leaf is a node with an in-degree of 0 and an out-degree of 1, and that nodes
start as leaves with probability $1-p$. We now compute the probability that a node that entered at time $t_i$ will
remain a leaf at time $n$, and compute the expected number of leaves in the system at time $n$. In this section
we do not use mean-field methods; instead, we give a combinatorial argument.

Let $v_i$ be the node that entered at time $t_i$, let $\text{in-deg}_n(v_i)$ be the in-degree of $v_i$ at time $n$, and
let $\text{out-deg}_n(v_i)$ be the out-degree of $v_i$ at time $n$.

Theorem 3.4 In the DInEd model, $E[\#\text{leaves}] \approx n \cdot \frac{(1+p)(1-p)}{2+p}$.

Proof: Let $e_1$ be the event that $v_i$ is not chosen as a provider (neither as the node connected to a new node, nor as an
endpoint of one of the $m-1$ new edges) at any of the times $t_i + 1, \ldots, n$. Let $e_2$ be the event that $v_i$
is not chosen as a customer at times $t_i + 1, \ldots, n$. Let $e_5$ be the event that $v_i$ starts as a leaf. Then

$$\Pr[\text{in-deg}_n(v_i) = 0,\ \text{out-deg}_n(v_i) = 1] = \Pr[e_5] \cdot \Pr[e_1] \cdot \Pr[e_2]. \qquad (9)$$



Note that $v_i$ cannot be chosen as an incoming endpoint of one of the added $p(m-1)$ edges in any round
if it hasn't been chosen earlier as a provider of the anti-parallel edge, and vice versa. Let us first examine the
event $e_1$. At time $j$ the expected number of edges in the network is $mj(1+p)$. Therefore the expected sum
of the in-degrees at time $j$ is $mj(1+p)$, and the expected sum of the out-degrees at time $j$ is $mj(1+p)$. We
assume that up to time $j$ the in-degree of $v_i$ is 0, and its out-degree is 1. Let $e_3$ be the event that one choice
during step $j+1$ missed $v_i$, and let $e_4$ be the event that all the choices made during time step $j+1$ missed
$v_i$. Thus,

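$$\Pr[e_3] = 1 - \frac{1}{mj(1+p)}.$$

We neglect the fact that between time $j$ and time $j+1$ more edges are added (so the sum of degrees grows
slightly), so we have

$$\Pr[e_4] \approx \left(1 - \frac{1}{mj(1+p)}\right)^m,$$

and therefore, taking logarithms (with $\ln(1-x) \approx -x$ for small $x$, and $\sum_{j=t_i+1}^{n} 1/j \approx \ln(n/t_i)$),

$$\Pr[e_1] \approx \prod_{j=t_i+1}^{n} \left(1 - \frac{1}{mj(1+p)}\right)^m \approx \left(\frac{t_i}{n}\right)^{\frac{1}{1+p}}. \qquad (10)$$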
As long as the in-degree of a leaf is 0, it will never be chosen as a customer on a new link. Therefore,
for the event $e_2$ we have that

$$\Pr[e_2] = 1. \qquad (11)$$

For the event $e_5$ we have that

$$\Pr[e_5] = 1 - p. \qquad (12)$$

Hence, from (10), (11) and (12) we get that

$$\Pr[\text{in-deg}_n(v_i) = 0,\ \text{out-deg}_n(v_i) = 1] \approx (1-p)\left(\frac{t_i}{n}\right)^{\frac{1}{1+p}}, \qquad (13)$$

and

$$E[\#\text{leaves}] \approx \sum_{t_i=1}^{n} (1-p)\left(\frac{t_i}{n}\right)^{\frac{1}{1+p}} \approx n \cdot \frac{(1+p)(1-p)}{2+p}. \qquad ∎$$
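The two approximations in this proof can be checked numerically. The Python sketch below is our own, and its function names are hypothetical. It compares the exact product in (10) with $(t_i/n)^{1/(1+p)}$, and the sum of (13) over all $t_i$ with the closed form of Theorem 3.4. For $p = 0.07$ the predicted leaf fraction is about 0.481.

```python
import math


def survival_product(t_i, n, m, p):
    """Pr[e_1] as the product in (10), evaluated in log space."""
    log_pr = sum(m * math.log1p(-1 / (m * j * (1 + p)))
                 for j in range(t_i + 1, n + 1))
    return math.exp(log_pr)


def leaf_fraction_sum(n, p):
    """(1/n) * sum over t_i = 1..n of (13), i.e. E[#leaves] / n."""
    e = 1 / (1 + p)
    return sum((1 - p) * (t / n) ** e for t in range(1, n + 1)) / n


def leaf_fraction_closed(p):
    """Theorem 3.4: E[#leaves] / n = (1 + p)(1 - p) / (2 + p)."""
    return (1 + p) * (1 - p) / (2 + p)


n, m, p = 100_000, 2.11, 0.07
print(survival_product(1000, n, m, p) / (1000 / n) ** (1 / (1 + p)))  # close to 1
print(round(leaf_fraction_sum(n, p), 3), round(leaf_fraction_closed(p), 3))  # 0.481 0.481
```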



4 The Directed Incremental Edge Addition with Geography 

In this section we introduce the full Geographic Directed Incremental Edge Addition (GeoDInEd) model. We
generalize the DInEd model in the following way: we define $\ell$ geographic regions, and a pre-determined
distribution $P_j$ for all $1 \le j \le \ell$. Every node is born into a geographic region. The region is selected
randomly according to the distribution $P_j$. We use these regional definitions to influence the nodes' choices
of peers, and give preference to regional peering arrangements, in which both peers are in the same region.

As in Section 3, we give the model's definition, analyze its degree distribution, prove that it has a power-law
distribution, and analyze its expected number of leaves. We show that the GeoDInEd model gives
exactly the same results as the DInEd model in terms of the power-law exponent and the expected number
of leaves, for any regional distribution $P_j$. However, our simulations show that the GeoDInEd model enjoys
a significantly improved clustering behavior, on both a global and regional level.



4.1 Model Definition 

In the GeoDInEd model, at each time step we add a new node and associate it with a geographic region
according to a pre-determined distribution $P_j$ for $1 \le j \le \ell$, where $\ell$ is the number of geographic regions.
As in the DInEd model, we add $m$ edges in each step: one connecting the new node, and $m-1$ connecting
existing nodes. Let $0 \le \alpha \le 1$ be a locality parameter, indicating the probability of an edge to be a local
(regional) edge. The edges are added according to the same process used in the DInEd model, with the
following differences:

1. The first edge always connects the new node to local nodes that are already present, i.e., to nodes in
its region.³

2. The remaining m — 1 edges connect existing nodes in the following manner: 

³ In the analysis we ignore the case of the first node born in a region, which obviously has to connect via a global edge. This
detail is addressed in the GdTang network generator.
(a) With probability $\alpha$ the edge is local. Thus its endpoints are restricted to be in the region of
the new node. Subject to this restriction, the endpoints are chosen with the same preferential
attachment rules as in the DInEd model.

(b) With probability $1-\alpha$ the edge is global. Therefore its endpoints are preferentially chosen,
as in the DInEd model. Note that a "global" edge may end up being local, since the choice of
endpoints is not constrained.
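The GeoDInEd rules can be sketched in Python as below. This is our illustration and is not GdTang. It makes the following assumptions:

- $m$ is an integer.
- The $m_0$ seed nodes are joined pairwise by symmetric edges and draw their regions from $P_j$ like every other node.
- Degree pools are updated once per time step.
- Whenever a regional pool is empty, or cannot supply two distinct endpoints, the edge falls back to a global choice. This fallback is our own guess; footnote 3 only says that GdTang addresses this case.

```python
import random


def geodined_graph(n, m, p, alpha, P, m0=3, seed=None):
    """Sketch of GeoDInEd growth; returns (edges, region) where edges are
    directed (customer, provider) pairs and region[x] is node x's region.

    Assumptions: integer m >= 1, m0 >= 2 seed nodes joined pairwise by
    symmetric edges, seed regions also drawn from P, pools updated once per
    step, and a fallback to the global pools when a regional pool is empty
    or cannot supply two distinct endpoints."""
    rng = random.Random(seed)
    regions = range(len(P))
    region = [rng.choices(regions, weights=P)[0] for _ in range(m0)]
    # Pools list node x once per out-edge (out_*) or once per in-edge (in_*),
    # globally (*_g) and per region (*_r).
    edges, out_g, in_g = [], [], []
    out_r = {j: [] for j in regions}
    in_r = {j: [] for j in regions}

    def record(c, q):
        edges.append((c, q))
        out_g.append(c)
        in_g.append(q)
        out_r[region[c]].append(c)
        in_r[region[q]].append(q)

    def pick_pair(ins, outs, tries=20):
        """Customer ~ in-degree, provider ~ out-degree, distinct nodes."""
        for _ in range(tries):
            c, q = rng.choice(ins), rng.choice(outs)
            if c != q:
                return c, q
        return None

    for u in range(m0):
        for v in range(u + 1, m0):
            record(u, v)
            record(v, u)

    for v in range(m0, n):
        j = rng.choices(regions, weights=P)[0]
        region.append(j)
        # First edge is local: provider ~ out-degree within v's region.
        pairs = [(v, rng.choice(out_r[j] or out_g))]
        for _ in range(m - 1):
            pair = None
            if rng.random() < alpha and in_r[j] and out_r[j]:
                pair = pick_pair(in_r[j], out_r[j])      # local edge
            while pair is None:                          # global edge
                pair = pick_pair(in_g, out_g)
            pairs.append(pair)
        step = []
        for c, q in pairs:
            step.append((c, q))
            if rng.random() < p:                         # symmetric w.p. p
                step.append((q, c))
        for c, q in step:
            record(c, q)
    return edges, region


def local_share(edges, region):
    """Fraction of directed edges whose endpoints share a region."""
    return sum(region[c] == region[q] for c, q in edges) / len(edges)


P = [0.5, 0.3, 0.2]
for alpha in (0.0, 0.9):
    edges, region = geodined_graph(3000, m=3, p=0.07, alpha=alpha, P=P, seed=7)
    print(alpha, round(local_share(edges, region), 2))
```

Raising $\alpha$ visibly increases the share of intra-region edges. This is the locality knob whose effect on regional cores the text attributes to $\alpha$.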

Our analysis shows that the GeoDInEd model produces a power-law degree distribution with an accurate
power-law exponent $\gamma$, for the global degrees as well as for the local degrees. Moreover, $\gamma$ is exactly the same
as that of DInEd for any regional distribution $P_j$ and any value of $\alpha$. However, our simulations show that
$\alpha$ has a strong effect on the clustering structure of the network: our model is the first to produce regional
cores.

4.2 Power Law Analysis 

We first prove that the GeoDInEd model produces a power-law distribution for the global degrees, and then
show that the GeoDInEd model produces a power-law distribution for the local degrees. As before, let $k_i(t)$
denote node $i$'s global in-degree at time $t$, and let $y_i(t)$ denote the global out-degree at time $t$. Let $\hat{k}_i(t)$
denote node $i$'s local in-degree at time $t$, and let $\hat{y}_i(t)$ denote the local out-degree at time $t$.

Theorem 4.1 In the GeoDInEd model, 

1. $\Pr[k_i(t) = k] \propto k^{-(1 + 1/\lambda_1)}$

2. $\Pr[y_i(t) = y] \propto y^{-(1 + 1/\lambda_1)}$

where $\lambda_1 = \frac{p(2m-1) + \Delta}{2(1+p)m}$ and $\Delta = \sqrt{p^2 + 4m(m-1)}$.
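As a quick numerical illustration of Theorem 4.1, the Python sketch below evaluates $\lambda_1 = \frac{p(2m-1)+\Delta}{2(1+p)m}$ with $\Delta = \sqrt{p^2 + 4m(m-1)}$ for the parameters used in Section 5 ($m = 2.11$, $p = 0.07$). The function names are illustrative. Since $\Pr[k_i(t) = k] \propto k^{-\gamma}$ with $\gamma = 1 + 1/\lambda_1$, the corresponding CCDF exponent is $\eta = \gamma - 1 = 1/\lambda_1$.

```python
import math


def lambda1(m: float, p: float) -> float:
    """lambda_1 as given in Theorem 4.1."""
    delta = math.sqrt(p ** 2 + 4 * m * (m - 1))
    return (p * (2 * m - 1) + delta) / (2 * (1 + p) * m)


def degree_exponent(m: float, p: float) -> float:
    """gamma in Pr[deg = k] ~ k^(-gamma), i.e. gamma = 1 + 1/lambda_1."""
    return 1 + 1 / lambda1(m, p)


def ccdf_exponent(m: float, p: float) -> float:
    """eta = gamma - 1 = 1/lambda_1, the magnitude of the CCDF slope."""
    return 1 / lambda1(m, p)


print(round(ccdf_exponent(2.11, 0.07), 2))  # 1.37
```

The result agrees with the exponent $\eta = 1.37$ measured on the synthetic networks in Section 5.3.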

We prove the theorem using the following sequence of lemmas. 

Lemma 4.2 Let $I_j$ and $O_j$ be the expected sums of in-degrees and out-degrees of nodes in region $j$, respectively. Then

$$I_j = O_j = P_j(1+p)mt.$$

Proof: The change in $I_j$ is influenced by the probability that the new node belongs to region $j$, the probability
that a node in $j$ is chosen preferentially as an end-point of a local edge, the probability that a node in $j$ is
chosen preferentially as an end-point of a global edge, and the probability of having an anti-parallel edge,
for each of the added $m$ edges. This gives us the following differential equation:

$$\frac{dI_j}{dt} = P_j(1+p) + P_j\alpha(m-1)(1+p) + (1-\alpha)(m-1)\cdot\left(\frac{O_j}{(1+p)mt} + p\,\frac{I_j}{(1+p)mt}\right)$$

In the same manner we get a similar differential equation for $O_j$:

$$\frac{dO_j}{dt} = P_j(1+p) + P_j\alpha(m-1)(1+p) + (1-\alpha)(m-1)\cdot\left(\frac{I_j}{(1+p)mt} + p\,\frac{O_j}{(1+p)mt}\right)$$
Thus we get the following system of differential equations:

$$\frac{dI_j}{dt} = P_j(1+p) + P_j\alpha(m-1)(1+p) + \frac{(1-\alpha)(m-1)}{m(1+p)}\cdot\left(\frac{O_j}{t} + p\,\frac{I_j}{t}\right) \qquad (14)$$

$$\frac{dO_j}{dt} = P_j(1+p) + P_j\alpha(m-1)(1+p) + \frac{(1-\alpha)(m-1)}{m(1+p)}\cdot\left(\frac{I_j}{t} + p\,\frac{O_j}{t}\right) \qquad (15)$$

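Solving this system of differential equations (subtracting (15) from (14) gives $\frac{d(I_j - O_j)}{dt} = -\frac{(1-\alpha)(m-1)(1-p)}{m(1+p)}\cdot\frac{I_j - O_j}{t}$, so the difference $I_j - O_j$ never grows and is negligible compared to the linear growth of $I_j$ and $O_j$) we get

$$I_j = O_j. \qquad (16)$$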

Substituting (16) in (14) we get the equation

$$\frac{dI_j}{dt} = P_j(1+p) + P_j\alpha(m-1)(1+p) + \frac{(1-\alpha)(m-1)}{m}\cdot\frac{I_j}{t} \qquad (17)$$

with the solution

$$I_j = \frac{P_j(1+p)(1+\alpha(m-1))}{1-(1-\alpha)(m-1)/m}\, t = P_j(1+p)mt.$$



This completes the proof of the lemma. ∎
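The last step of the proof uses the identity $1 - (1-\alpha)(m-1)/m = (1 + \alpha(m-1))/m$. A small numerical check in Python (a sketch; the helper names are illustrative) confirms that $I_j = P_j(1+p)mt$ satisfies equation (17) over a grid of parameter values:

```python
import itertools


def rhs_17(I: float, t: float, Pj: float, p: float, m: float, alpha: float) -> float:
    """Right-hand side of equation (17)."""
    return (Pj * (1 + p) + Pj * alpha * (m - 1) * (1 + p)
            + (1 - alpha) * (m - 1) / m * I / t)


def closed_form(t: float, Pj: float, p: float, m: float) -> float:
    """Claimed solution I_j = P_j (1 + p) m t."""
    return Pj * (1 + p) * m * t


def max_residual() -> float:
    """Largest |dI_j/dt - rhs_17| over a grid of parameter values."""
    worst = 0.0
    for Pj, p, m, alpha, t in itertools.product(
            [0.03, 0.55], [0.0, 0.07], [1.5, 2.11, 4.0], [0.0, 0.5, 1.0], [10.0, 1e4]):
        I = closed_form(t, Pj, p, m)
        dI_dt = I / t  # the solution is linear in t, so its derivative is I / t
        worst = max(worst, abs(dI_dt - rhs_17(I, t, Pj, p, m, alpha)))
    return worst


print(max_residual() < 1e-12)  # True
```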

Lemma 4.3 Let $t_i$ be the time at which node $i$ was added to the system. Then



, C + p ft\ Xl -C + p f t\ M 



= Gl-l + (G-2£>4)I-1 , (19) 

A 2 = Pi 2Mi+ P ) A ' B = 2 (! +P) m ~P 2 ,C = B/A, D = and G = DC + 1/2 + DA. 

Proof: Suppose node $i$ belongs to region $j$. By Lemma 4.2, at time $t$ the expected sum of the in-degrees
of the nodes in region $j$ and the expected sum of their out-degrees are both $P_j(1+p)mt$. The change in
an existing node's in-degree is influenced by three factors: the probability of it being chosen preferentially, according to
its global out-degree, as the end-point of the local edge connecting the new node; the probability of it being
chosen preferentially, according to its global out-degree, as an end-point of a local or a global edge, for each
of the added $m - 1$ edges; and the probability of it being chosen preferentially, according to its global in-degree,
as an end-point of a local or a global edge, for each of the added $m - 1$ edges, multiplied by the probability $p$ of
having the anti-parallel edge. This gives us the following differential equation:



$$\frac{dk_i}{dt} = P_j\cdot\frac{y_i}{P_j(1+p)mt} + \alpha P_j(m-1)\cdot\frac{y_i + pk_i}{P_j(1+p)mt} + (1-\alpha)(m-1)\cdot\frac{y_i + pk_i}{(1+p)mt}$$

Conveniently, $P_j$ cancels out, and after rearranging we get:

$$\frac{dk_i}{dt} = \frac{y_i}{(1+p)mt} + \alpha(m-1)\cdot\frac{y_i + pk_i}{(1+p)mt} + (1-\alpha)(m-1)\cdot\frac{y_i + pk_i}{(1+p)mt}$$
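Therefore $\alpha$ vanishes: the last two terms add up to $(m-1)\cdot\frac{y_i + pk_i}{(1+p)mt}$, so that $\frac{dk_i}{dt} = \frac{m y_i + (m-1)pk_i}{(1+p)mt}$, which is exactly the corresponding differential equation of the DInEd model.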
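Similarly, for the global out-degree $y_i$ the locality parameter $\alpha$ cancels out in the same way, and we obtain exactly the corresponding differential equation of the DInEd model.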

Thus we get the same system of differential equations as in the DInEd model, for any distribution $P_j$
and any value of $\alpha$. This completes the proof of the lemma. ∎
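The cancellation of $\alpha$ (and of $P_j$) in the in-degree equation can also be checked numerically. The Python sketch below, with illustrative names, evaluates the right-hand side of the first differential equation in the proof for several values of $\alpha$ and $P_j$ and compares it with the combined form $\frac{m y_i + (m-1)pk_i}{(1+p)mt}$:

```python
def dk_dt(y: float, k: float, t: float, m: float, p: float,
          alpha: float, Pj: float) -> float:
    """Right-hand side of dk_i/dt before P_j and alpha cancel."""
    first_edge = Pj * y / (Pj * (1 + p) * m * t)
    local = alpha * Pj * (m - 1) * (y + p * k) / (Pj * (1 + p) * m * t)
    global_ = (1 - alpha) * (m - 1) * (y + p * k) / ((1 + p) * m * t)
    return first_edge + local + global_


def dk_dt_combined(y: float, k: float, t: float, m: float, p: float) -> float:
    """The same rate after alpha and P_j have vanished."""
    return (m * y + (m - 1) * p * k) / ((1 + p) * m * t)


values = [dk_dt(3.0, 5.0, 100.0, 2.11, 0.07, a, P)
          for a in (0.0, 0.3, 0.5, 1.0) for P in (0.0296, 0.5545)]
target = dk_dt_combined(3.0, 5.0, 100.0, 2.11, 0.07)
print(max(abs(v - target) for v in values) < 1e-12)  # True
```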

Proof of Theorem 4.1: Using Lemma 4.3, the proof follows from the proof of Theorem 3.1. ∎

Theorem 4.4 below shows that the GeoDInEd model produces exactly the same power-law distribution
not only globally, but also within each region.

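Theorem 4.4 In the GeoDInEd model, the local in-degrees $k^l_i(t)$ and the local out-degrees $y^l_i(t)$ follow the same power laws as the global degrees in Theorem 4.1, with the same $\lambda_1$.

Proof omitted.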

4.3 Analysis of the Expected Number of Leaves 

As in the DInEd model, a leaf is a node with in-degree 0 and out-degree 1, and a new node starts as a leaf
with probability $1 - p$. The following theorem shows that the presence of the locality parameter does not
alter the number of leaves (as compared to the DInEd model):

Theorem 4.5 In the GeoDInEd model, $E[\#\mathrm{leaves}] \approx$

Proof omitted. Thus we obtain the same result as in the DInEd model.
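Counting leaves in a generated directed graph is straightforward. The following Python sketch assumes the graph is given as an explicit node list plus a set of (customer, provider) edges, with a symmetric peer-to-peer arrangement stored as two anti-parallel edges; the function name is illustrative.

```python
from collections import Counter
from typing import Hashable, Iterable, Set, Tuple


def leaf_fraction(nodes: Iterable[Hashable],
                  edges: Set[Tuple[Hashable, Hashable]]) -> float:
    """Fraction of nodes with in-degree 0 and out-degree 1.

    Edges point from customer to provider, so a leaf is a node with a single
    provider and no customers or symmetric peers.
    """
    nodes = list(nodes)
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    leaves = sum(1 for v in nodes if in_deg[v] == 0 and out_deg[v] == 1)
    return leaves / len(nodes)


print(leaf_fraction([1, 2, 3, 4], {(2, 1), (3, 1), (1, 4), (4, 1)}))  # 0.5
```

Nodes 2 and 3 are leaves here; node 4 is not, because its symmetric link with node 1 gives it in-degree 1.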

5 Implementation 

We implemented the GeoDInEd model as a synthetic network generator, GdTang, which is freely available from
the authors. GdTang accepts the following parameters:

1. The desired number of vertices ($n$).

2. The average number of edges added in each step, possibly fractional ($m$).

3. The regional distribution $P_j$ for $l$ different geographic regions.

4. The locality parameter $\alpha$, indicating the probability that an edge is a local (regional) edge, as described in Section 4.
5. A parameter $p$, which describes the probability that a new edge is a peer-to-peer (double-sided)
edge.

Region #   Region ID       Frequency
1          NAFTA           55.45%
2          EMEA            18.53%
3          AP              8.05%
4          Latin America   2.96%
5-26       Miscellaneous   0.09%-0.45%

Table 1: Region Size Distribution.

Setting the number of geographic regions to $l = 1$ causes GdTang to use the basic DInEd model;
similarly, setting the locality parameter to $\alpha = 0$ approximates the DInEd model.

For the regional distribution, we used the AS per-country distribution data collected by the CAIDA
project [CAI04], in the following way. We defined 4 large geographic regions that include multiple
countries: NAFTA (USA, Canada and Mexico), EMEA (Europe, Middle-East and Africa), AP (Asia-Pacific:
South-east Asia and Australia) and Latin America. Each other country formed its own geographic region.
For each region, we defined its frequency as the fraction of ASes located in the region, i.e., the sum of the
per-country frequencies of its countries. After processing the raw data, we obtained the distribution shown in Table 1.
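The aggregation of per-country data into regions can be sketched in Python as follows. The country codes, the partial `REGION_OF` mapping, and the counts in the usage line are made-up placeholders, not the CAIDA data; countries absent from the mapping form their own region, as described above.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical, partial country-to-region mapping (illustration only).
REGION_OF: Dict[str, str] = {
    "US": "NAFTA", "CA": "NAFTA", "MX": "NAFTA",
    "DE": "EMEA", "ZA": "EMEA",
    "AU": "AP", "SG": "AP",
    "BR": "Latin America", "AR": "Latin America",
}


def region_distribution(as_count_by_country: Dict[str, int],
                        region_of: Dict[str, str] = REGION_OF) -> Dict[str, float]:
    """Return each region's frequency: the fraction of all ASes located in it."""
    total = sum(as_count_by_country.values())
    counts: Dict[str, int] = defaultdict(int)
    for country, n_as in as_count_by_country.items():
        # Countries outside the four large regions form their own region.
        counts[region_of.get(country, country)] += n_as
    return {region: n / total for region, n in counts.items()}


print(region_distribution({"US": 50, "CA": 5, "DE": 30, "XX": 15}))
# {'NAFTA': 0.55, 'EMEA': 0.3, 'XX': 0.15}
```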

We used GdTang to generate synthetic topologies with Internet-like parameters. In all the experiments
we used $n = 15{,}000$ and $m = 2.11$, which match the values reported in [SW04].

5.1 The Fraction of Symmetric Peering Arrangements 

Recall that our model uses the parameter $p$ for the probability that a peering arrangement is symmetric.
However, even when $p = 0$, the model has some probability of producing anti-parallel edges. Therefore,
to best match reality, we need to calibrate $p$ so that the total number of symmetric peering
arrangements is realistic. Gao [Gao01] shows that about 8% of the peering arrangements have a symmetric
peer-to-peer nature. Fig. 1 shows the fraction of peer-to-peer edges as a function of the locality parameter $\alpha$
for $p = 0, 0.04, 0.07, 0.1$. The figure shows that our model naturally produces 2-3% symmetric edges, and
that the effect of the parameter $p$ is roughly additive. Thus, with $p = 0.07$ the model produces 8.53-9.79%
symmetric peering arrangements. All the results in the following experiments are based on topologies produced
by GdTang with $p = 0.07$.
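The fraction of symmetric peering arrangements can be measured on a directed edge set as sketched below in Python. As an assumption for illustration, each unordered pair of adjacent ASes counts as one peering arrangement, and the arrangement is symmetric exactly when both anti-parallel edges are present.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (customer, provider)


def symmetric_fraction(edges: Set[Edge]) -> float:
    """Fraction of peering arrangements (unordered adjacent pairs) that are symmetric."""
    pairs = {frozenset(e) for e in edges if e[0] != e[1]}
    symmetric = {frozenset((u, v)) for (u, v) in edges
                 if u != v and (v, u) in edges}
    return len(symmetric) / len(pairs)


print(symmetric_fraction({(1, 2), (2, 1), (1, 3), (3, 4)}))  # 0.3333333333333333
```

Counting unordered pairs rather than directed edges avoids counting a symmetric arrangement twice.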

5.2 Dense Core Analysis 

Our next experiment was designed to test the effects of the locality parameter $\alpha$. Recall that $\alpha$ provably has
no effect on the degree distribution (Theorem 4.1). However, we expect $\alpha$ to have a strong effect on
the clustering structure. Therefore, we generated networks with varying values of $\alpha$ and computed the sizes
of all the dense cores of more than 6 nodes in each network. We sorted the cores in decreasing order of size.

In order to find the dense cores in the networks, we used the Dense k-Subgraph (DkS) algorithms
of [FKP01, SW04]. These algorithms search for the densest clusters (subgraphs) with a density above a
threshold; we used a value of 70%. Fig. 2 shows the sizes of the clusters found by the algorithm as a
function of the locality parameter $\alpha$. Each point on the curve is the average over 10 random networks
generated with the same parameters.

Figure 1: Fraction of symmetric peering arrangements as a function of the locality parameter $\alpha$ for various
values of $p$.

Figure 2: Sizes of the clusters as a function of $\alpha$ for $p = 0.07$.

Sagie and Wool [SW04] have shown that the real AS graph has 5 dense clusters with density above 70%.
These clusters are of sizes 43, 14, 8, 8 and 7.
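For readers who want to experiment, one simple way to look for a single cluster whose edge density exceeds a threshold is greedy peeling: repeatedly delete a minimum-degree vertex and stop at the first remaining vertex set whose density reaches the threshold, which is the largest such set along the peeling order. This is a swapped-in heuristic for illustration, not the DkS algorithms of [FKP01, SW04]. The Python sketch assumes an undirected adjacency map and measures density as the fraction of vertex pairs that are adjacent.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def density(adj: Graph, nodes: Set[Hashable]) -> float:
    """Fraction of vertex pairs inside `nodes` that are joined by an edge."""
    k = len(nodes)
    if k < 2:
        return 0.0
    edges = sum(len(adj[u] & nodes) for u in nodes) // 2
    return edges / (k * (k - 1) / 2)


def peel_dense_cluster(adj: Graph, threshold: float = 0.7,
                       min_size: int = 2) -> Set[Hashable]:
    """Greedy peeling heuristic; returns an empty set if nothing qualifies."""
    remaining = set(adj)
    while len(remaining) >= min_size:
        if density(adj, remaining) >= threshold:
            return remaining
        # Remove a vertex with the fewest neighbours inside the remaining set.
        weakest = min(remaining, key=lambda v: len(adj[v] & remaining))
        remaining.remove(weakest)
    return set()
```

With `threshold=0.7` this mirrors the 70% density threshold used above. It returns one cluster only; finding several cores would require further searches, for example on the graph with the found cluster removed.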

Fig. 2 shows that a large Dense Core exists for all values of $\alpha$. However, increasing $\alpha$
produces additional cores, whose size and number grow with $\alpha$. A detailed inspection of the raw data shows
that 98% of these secondary cores are fully contained in one of the regions, i.e., they model the so-called
Regional Cores. We believe that our model is the first to exhibit such regional cores.
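Checking whether a core is regional only requires the region label of each of its members. A short Python sketch (names illustrative) that computes the fraction of cores fully contained in a single region:

```python
from typing import Dict, Hashable, Iterable, Set


def regional_fraction(cores: Iterable[Set[Hashable]],
                      region_of: Dict[Hashable, str]) -> float:
    """Fraction of cores whose members all lie in one region."""
    cores = list(cores)
    contained = sum(1 for core in cores
                    if len({region_of[v] for v in core}) == 1)
    return contained / len(cores)


print(regional_fraction([{1, 2}, {3, 4}], {1: "A", 2: "A", 3: "A", 4: "B"}))  # 0.5
```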

Note that the large Dense Core that our model produces is slightly smaller than the size of 43 measured
by [SW04], and that the Dense Core shrinks somewhat as $\alpha$ grows. The Dense Core is not confined to a
single region, so a higher locality parameter reduces the tendency of core members to form edges with other
core members, thereby making the core less dense.

Figure 3: The CCDF of the Internet's AS-graph degree distribution, shown with the CCDF of the synthetic networks ("world"), and with 4 $\mathrm{CCDF}_R$ curves for the largest regions we defined, on a log-log scale.

Figure 4: The path inflation percentage per tier in a synthetic graph, generated with $\alpha = 0.5$ and $p = 0.07$ (x-axis: number of AS hops inflated).

Fig. 2 shows that the GdTang networks have realistic dense and regional cores when the locality
parameter is around $\alpha = 0.5$, i.e., when each new edge has a 50% probability of being a local (regional) edge.

5.3 Power Law Analysis 

Fig. 3 shows the Complementary Cumulative Distribution Function for a regional distribution ($\mathrm{CCDF}_R$)⁴ of the
degree distribution in the Internet's AS-graph and in the GdTang-generated synthetic networks. For the
synthetic networks, each $\mathrm{CCDF}_R$ curve is the average over 10 randomly generated networks.
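The regional CCDF, $\mathrm{CCDF}_R(k) = \Pr[\deg(v) > k \wedge v \in R]$, and a log-log slope estimate of $\eta$ can be sketched in Python as follows. Note that $\mathrm{CCDF}_R(k)$ is a joint probability over all nodes, not a probability conditioned on $v \in R$. The least-squares fit on log-log axes is an assumption for illustration; the text does not state how the exponents were fitted.

```python
import math
from typing import Dict, Hashable, List, Tuple


def ccdf_region(degree: Dict[Hashable, int], region_of: Dict[Hashable, str],
                region: str) -> List[Tuple[int, float]]:
    """Points (k, Pr[deg(v) > k and v in region]) taken over all nodes v.

    Zero values (k equal to the region's maximum degree) are dropped so the
    points can be fitted on log-log axes.
    """
    n = len(degree)
    members = [d for v, d in degree.items() if region_of[v] == region]
    points = []
    for k in sorted(set(members)):
        frac = sum(1 for d in members if d > k) / n
        if frac > 0:
            points.append((k, frac))
    return points


def loglog_slope(points: List[Tuple[float, float]]) -> float:
    """Least-squares slope of log(CCDF) against log(k); eta is minus this slope."""
    points = [(k, c) for k, c in points if k > 0 and c > 0]
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(c) for _, c in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Given a degree map for a generated network, `-loglog_slope(ccdf_region(degree, region_of, "NAFTA"))` would estimate $\eta$ for that region.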

Fig. 3 shows the well-known power law of the Internet AS graph, with a CCDF exponent of $\eta = 1.17$.
It also shows that the GeoDInEd model has a fairly accurate power-law exponent of $\eta = 1.37$.
Note that this is precisely the value predicted in Theorem 3.1, thus validating the estimations used in the
proofs.

⁴ For any distribution of degrees in any given region $R$, $\mathrm{CCDF}_R(k) = \Pr[\deg_R(v) > k \wedge v \in R]$. Note that if $\Pr[\deg_R(v) = k] \propto k^{-\gamma}$ then $\mathrm{CCDF}(k) \propto k^{-\eta} = k^{1-\gamma}$.

The data show that, as predicted by Theorem 4.5, the model brings the fraction of leaves in the network
to 49%, while the fraction of leaves in the AS-graph is 30%. Thus it seems that the GeoDInEd model
produces too many leaves. Note, though, that the number of leaves in the AS-graph is slightly too low for
the power law that its degree distribution exhibits: Fig. 3 shows that the AS-graph's CCDF has a "bump" for
small degree values. Thus we speculate that an additional process is taking place and affecting the frequency
of low-connectivity nodes. Exploring and modeling this phenomenon is left for future work.

5.4 Path inflation effects 

Gao and Wang [GW02] discuss path inflation in the Internet's AS graph due to the so-called No-Valley
routing policy. They reported that 20% of paths exhibited path inflation for tier-1 ISPs, 55% for tier-2 ISPs,
and 20% for tier-3 ISPs. In order to compare these findings to the behavior of our synthetic networks,
we define the No-Valley routing policy as follows:

No-Valley Routing Policy: an AS does not provide transit services between any two of its providers.
That is, in an AS path $(u_1, u_2, \ldots, u_n)$, if $(u_i, u_{i+1})$ has a provider-customer relationship, then $(u_j, u_{j+1})$ must
have a provider-customer relationship for any $i < j < n$. We divided the ASes into tiers based on node
degrees in the following way:

Tierl - nodes with Deg(node) > 100 

Tier2 - nodes with 20 < Deg(node) < 100 

Tier3 - nodes with 3 < Deg(node) < 20 
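The tier thresholds and the No-Valley condition above can be sketched in Python. Assumptions for illustration: edges are given as (customer, provider) pairs, as in the model, a pair of anti-parallel edges denotes a symmetric peer-to-peer link, and the tier boundaries follow the strict inequalities as printed, so degrees outside them get no tier.

```python
from typing import Hashable, Optional, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (customer, provider)


def tier(deg: int) -> Optional[int]:
    """Tier by node degree, using the thresholds listed above."""
    if deg > 100:
        return 1
    if 20 < deg < 100:
        return 2
    if 3 < deg < 20:
        return 3
    return None


def relation(edges: Set[Edge], u: Hashable, v: Hashable) -> str:
    """Relationship met when traversing the link from u to v."""
    up, down = (u, v) in edges, (v, u) in edges
    if up and down:
        return "p2p"  # symmetric peer-to-peer arrangement
    if up:
        return "c2p"  # u is a customer of v
    if down:
        return "p2c"  # u is a provider of v
    raise ValueError(f"no link between {u!r} and {v!r}")


def is_no_valley(path: Sequence[Hashable], edges: Set[Edge]) -> bool:
    """After the first provider-to-customer hop, every later hop must be one too."""
    descending = False
    for u, v in zip(path, path[1:]):
        rel = relation(edges, u, v)
        if descending and rel != "p2c":
            return False
        descending = descending or rel == "p2c"
    return True
```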

We adopted the algorithm proposed by Gao and Wang [GW02] for computing the shortest AS path
among all no-valley paths, using our definition of the No-Valley routing policy, and used it to calculate path
inflation within the three tiers. Fig. 4 shows that the results we obtained are fairly close to those reported by
Gao and Wang [GW02]: 11% path inflation for tier-1, 22% for tier-2, and 23% for
tier-3.
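One standard way to compute the shortest path among all no-valley paths is a breadth-first search over (node, phase) states, where the phase records whether a provider-to-customer hop has already been taken. The Python sketch below uses this technique; it is an illustration, not necessarily the algorithm of [GW02]. It assumes edges are (customer, provider) pairs and that anti-parallel pairs are peer-to-peer links. Path inflation is then the difference between the shortest no-valley path and the shortest unconstrained path.

```python
from collections import deque
from typing import Dict, Hashable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (customer, provider)


def neighbours(edges: Set[Edge]) -> Dict[Hashable, Set[Hashable]]:
    """Undirected adjacency derived from the directed edge set."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_no_valley(edges: Set[Edge], src: Hashable, dst: Hashable) -> Optional[int]:
    """Hop count of the shortest no-valley path, or None if there is none."""
    adj = neighbours(edges)
    start = (src, False)  # False: no provider-to-customer hop taken yet
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node, descending = queue.popleft()
        if node == dst:
            return dist[(node, descending)]
        for nxt in adj.get(node, ()):
            p2c = (nxt, node) in edges and (node, nxt) not in edges
            if descending and not p2c:
                continue  # leaving the downhill phase would create a valley
            state = (nxt, descending or p2c)
            if state not in dist:
                dist[state] = dist[(node, descending)] + 1
                queue.append(state)
    return None


def shortest_any(edges: Set[Edge], src: Hashable, dst: Hashable) -> Optional[int]:
    """Hop count of the shortest path, ignoring routing policy."""
    adj = neighbours(edges)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


# Toy example: P1 and P2 share customer X; their providers T1 and T2 peer.
edges = {("X", "P1"), ("X", "P2"), ("P1", "T1"), ("P2", "T2"), ("T1", "T2"), ("T2", "T1")}
print(shortest_no_valley(edges, "P1", "P2") - shortest_any(edges, "P1", "P2"))  # 1
```

Because BFS pops states in nondecreasing distance, the first popped state at `dst` gives the shortest no-valley length over both phases.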

6 Conclusions and Future Work 

We have shown that our model, the GeoDInEd model, significantly improves upon previously suggested 
models. Most importantly, our model produces directed graphs, which allow a much more appropriate 
representation of the AS-graph's Customer-Provider peering arrangements, as well as a representation of 
symmetric peer-to-peer arrangements. Besides being more realistic, GeoDInEd even improves upon earlier, 
undirected, models in terms of the (undirected) power-law exponent. Using a simple notion of geography, 
our model shows that different clustering structures can all manifest the same power-law. Moreover, in 
addition to the global dense core, for the first time, our model produces regional dense cores, when peering 
arrangements have a 50% probability of being regional. Our model also exhibits realistic path inflation 
effects. Finally, our model is amenable to mathematical analysis, and is implemented as a freely available 
network generator. 